Abstract

In this paper, we consider high-speed wireless packet access using code division multiple access
(CDMA) and multiple-input-multiple-output (MIMO). Current wireless standards, such as high speed packet
access (HSPA), have adopted multi-code transmission and hybrid-automatic repeat request (ARQ) as major
technologies for delivering high data rates. The key technique in hybrid-ARQ is that erroneous data packets
are kept in the receiver to detect/decode retransmitted ones. This strategy is referred to as packet combining.
In CDMA MIMO-based wireless packet access, multi-code transmission suffers from severe performance
degradation due to the loss of code orthogonality caused by both inter-chip interference (ICI) and co-antenna
interference (CAI). This limitation results in large transmission delays when an ARQ mechanism is used
in the link layer. In this paper, we investigate efficient minimum mean square error (MMSE) frequency
domain equalization (FDE)-based iterative (turbo) packet combining for cyclic prefix (CP)-CDMA MIMO
with Chase-type ARQ. We introduce two turbo packet combining schemes: i) in the first scheme, namely
"chip-level turbo packet combining", MMSE FDE and packet combining are jointly performed at the chip
level; ii) in the second scheme, namely "symbol-level turbo packet combining", chip-level MMSE FDE and
despreading are separately carried out for each transmission, then packet combining is performed at the level
of the soft demapper. The computational complexity and memory requirements of both techniques are quite
insensitive to the ARQ delay, i.e., maximum number of ARQ rounds. The throughput is evaluated for some
representative antenna configurations and load factors (i.e., number of orthogonal codes with respect to the
spreading factor) to show the gains offered by the proposed techniques.

Index Terms 

Code division multiple access (CDMA), multi-code transmission, broadband multiple-input-multiple- 
output (MIMO), automatic repeat request (ARQ), packet combining, frequency domain methods. 

I. Introduction 

Space-time (ST) multiplexing oriented multiple-input-multiple-output (MIMO) and hybrid- 
automatic repeat request (ARQ) are two core technologies used in the emerging code division 
multiple access (CDMA)-based wireless packet access standards [1]. In ST multiplexing architectures, 
independent data streams are sent over multiple antennas to increase the transmission rate [2]. In 
hybrid-ARQ, erroneous data packets are kept in the receiver to help decode the retransmitted packet, 
using packet combining techniques (e.g. see [3] and references therein). 

To support heterogeneous data rates in CDMA systems, multiple spreading codes can simultaneously
be allocated to the same user when a high data rate is requested [4]. This method is often referred to
as "multi-code transmission," and has been considered in the high speed packet access (HSPA) system
[5]. In MIMO CDMA systems, multi-code transmission offers a spectrum efficiency that increases
linearly with the number of spreading codes and transmit antennas. This is achieved
by assigning the same spreading code group to all transmit antennas. However, in severe frequency
selective fading wireless channels, the performance of this scheme can dramatically deteriorate due to
co-antenna interference (CAI) and inter-chip interference (ICI). This results in a large delay (due to
multiple transmissions) when an ARQ protocol is used in the link layer. Motivated by this limitation,
we investigate efficient hybrid-ARQ receiver schemes that reduce the number of ARQ
rounds required to correctly decode a data packet in MIMO CDMA ARQ systems with multi-code
transmission.

Recently, cyclic-prefix (CP) aided single carrier (SC) CDMA transmission with chip-level minimum
mean square error (MMSE)-based frequency domain equalization (FDE) has been introduced [6]. It
is a transceiver scheme that achieves attractive performance at an affordable computational
complexity cost. Turbo MMSE-FDE for CP-CDMA has then been proposed to cope with severe ICI
[7]. In [8], MMSE FDE has been applied to perform packet combining for multi-code CP-CDMA
systems with ARQ operating over severe frequency selective fading channels. It has recently been
demonstrated that ARQ presents an important source of diversity in MIMO systems [9]. Interestingly,
it has been shown in [9] that for both short and long-term static^1 ARQ channel dynamics, multiple
transmissions improve the diversity order of the corresponding MIMO ARQ channel. The case of
block-fading MIMO ARQ, i.e., multiple fading blocks are observed within the same ARQ round, has
been reported in [10]. Information rates and turbo MMSE packet combining strategies for the frequency
selective fading MIMO ARQ channel have been investigated in [11]. Turbo MMSE packet combining
for broadband MIMO ARQ systems with co-channel interference (CCI) has recently been reported
in [12] and [13] using time and frequency domain combining methods, respectively.

In this paper, we consider Chase-type ARQ with multi-code CP-CDMA MIMO transmission^2 over a
broadband wireless channel. We propose two iterative (turbo) packet combining schemes where, at
each ARQ round, the data packet is decoded by iteratively exchanging soft information in the form of
log-likelihood ratios (LLRs) between the soft-input-soft-output (SISO) packet combiner and the SISO
decoder. In the first turbo packet combining scheme, we exploit the fact that both the CP chip-word and
data packet are retransmitted at each ARQ round. This allows us to view each transmission as a group
of virtual receive antennas, and build up a virtual MIMO channel that takes into account both multi-
antenna and multi-round transmission. We therefore perform combining of multiple transmissions
jointly with chip-level soft MMSE FDE. This scheme is called chip-level packet combining. In the
second scheme, both chip-level soft MMSE FDE and despreading are separately carried out for each
transmission. Combining is then performed at the level of the soft symbol demapper. We analyze both
the computational complexity and memory required by the proposed techniques, and show that they
are quite insensitive to the ARQ delay, i.e., maximum number of ARQ rounds. Finally, we evaluate and
compare the throughput performance of the proposed schemes for some representative load factors
(i.e., number of parallel codes with respect to the spreading factor) and antenna configurations.

^1 The short-term static ARQ channel dynamic corresponds to the case where two consecutive ARQ rounds observe independent
channel realizations. In long-term static channels, all ARQ rounds corresponding to the same data packet observe the same channel
realization.

^2 In this MIMO CDMA ARQ transmission scheme, the chip packet is completely retransmitted at each ARQ round.

Throughout this paper, $(\cdot)^{\top}$ and $(\cdot)^{H}$ denote the transpose and transpose conjugate of the argument,
respectively. $\mathrm{diag}\{x\} \in \mathbb{C}^{n \times n}$ and $\mathrm{diag}\{X_1, \dots, X_m\} \in \mathbb{C}^{mn_1 \times mn_2}$ denote the diagonal matrix and
block diagonal matrix constructed from $x \in \mathbb{C}^{n}$ and $X_1, \dots, X_m \in \mathbb{C}^{n_1 \times n_2}$, respectively. For $x \in \mathbb{C}^{TN}$,
$x_f$ denotes the discrete Fourier transform (DFT) of $x$, i.e. $x_f = U_{T,N}\, x$, with $U_{T,N} = U_T \otimes I_N$,
where $I_N$ is the $N \times N$ identity matrix, $U_T$ is a unitary $T \times T$ matrix whose $(m, n)$th element is
$(U_T)_{m,n} = \frac{1}{\sqrt{T}} e^{-j(2\pi mn/T)}$, $j = \sqrt{-1}$, and $\otimes$ denotes the Kronecker product. The rest of this paper has
the following structure. In Section II, we present the CP-CDMA MIMO ARQ transmission scheme
then provide its corresponding communication model. In Section III, we derive the two iterative soft
MMSE FDE-aided packet combining schemes we propose in this paper. Section IV analyzes the
complexity and memory size required by both schemes, then focuses on the comparison of their
throughput performances. The paper is concluded in Section V.
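The DFT notation introduced above can be checked numerically. The following is a minimal NumPy sketch, not part of the original paper: the helper names `unitary_dft` and `block_dft` and the sizes `T = 8`, `N = 2` are illustrative assumptions. It builds $U_{T,N} = U_T \otimes I_N$ explicitly and compares the result with a unitary FFT taken along the time axis.

```python
import numpy as np

def unitary_dft(T):
    """Unitary T x T DFT matrix with entries exp(-j*2*pi*m*n/T) / sqrt(T)."""
    idx = np.arange(T)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / T) / np.sqrt(T)

def block_dft(x, T, N):
    """Apply U_{T,N} = U_T kron I_N to a length T*N vector x.

    x is assumed to be ordered as T consecutive blocks of N entries
    (one block per time index), which is what the Kronecker form implies.
    """
    return np.kron(unitary_dft(T), np.eye(N)) @ x

rng = np.random.default_rng(0)
T, N = 8, 2  # illustrative sizes only
x = rng.standard_normal(T * N) + 1j * rng.standard_normal(T * N)

x_f = block_dft(x, T, N)

# Equivalent computation: unitary FFT along the time axis of the T x N array.
x_f_fft = np.fft.fft(x.reshape(T, N), axis=0, norm="ortho").reshape(-1)

print(np.allclose(x_f, x_f_fft))
```

In practice the FFT form is the one to use, since the explicit Kronecker matrix has $(TN)^2$ entries; the explicit form is shown only to mirror the notation.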

II. System Description 
A. CP-CDMA MIMO ARQ Transmission Scheme 

We consider a single user multi-code CP-CDMA transmission scheme over a broadband MIMO
channel with an ARQ protocol in the upper layer, where the ARQ delay is $K$ (index $k = 1, \dots, K$).
An information block is first encoded using a $\rho$-rate encoder, then interleaved with the aid of a
semi-random interleaver $\Pi$, and spatially multiplexed over $N_t$ transmit antennas (index $t = 1, \dots, N_t$) to
produce the coded and interleaved frame $b$, which is serial-to-parallel converted to $N_t$ sub-streams
$b_1, \dots, b_{N_t}$, where

$$b_t = [b_{t,0,1}, \dots, b_{t,j,m}, \dots, b_{t,T_s-1,M}] \in \{0, 1\}^{T_s M}. \qquad (1)$$

$T_s$ denotes the length of the symbol block transmitted over each antenna (index $j = 0, \dots, T_s - 1$).
Each sub-stream is then symbol mapped onto the elements of constellation $\mathcal{S}$, where $|\mathcal{S}| = 2^M$.
For each antenna, the symbol block is passed through a serial-to-parallel converter and a spreading
module which consists of $C$ orthogonal codes. The same spreading matrix

$$W \triangleq [w_1^{\top}, \dots, w_C^{\top}] \in \{\pm 1/\sqrt{N}\}^{N \times C} \qquad (2)$$

is used for each transmit antenna, where

$$w_n = [w_{1,n}, \dots, w_{N,n}], \quad n = 1, \dots, C, \qquad (3)$$

is a Walsh code of length $N$ (i.e., spreading factor), and $C \le N$ is the number of multiplexed codes.
The rate of this space-time code (STC) is therefore

$$R = \rho M N_t C. \qquad (4)$$
The $C$ parallel chip-streams on each antenna are then added together to construct a block of $T_c = T_s N / C$
chips (index $i = 0, \dots, T_c - 1$). The chips at the output of the $N_t$ transmit antennas are arranged in
the $N_t \times T_c$ matrix

$$X \triangleq \begin{bmatrix} x_{1,0} & \cdots & x_{1,T_c-1} \\ \vdots & & \vdots \\ x_{N_t,0} & \cdots & x_{N_t,T_c-1} \end{bmatrix}, \qquad (5)$$

where

$$x_{t,i} = \sum_{n=1}^{C} s_{t,n,i}\, w_{p,n}, \quad p = i \bmod N + 1, \qquad (6)$$

and $s_{t,n,i}$ denotes the symbol transmitted by antenna $t$ at channel use (c.u.) $i$ using Walsh code $w_n$.
Transmitted chips are independent (infinitely deep interleaving assumption), and the chip energy is
normalized to one, i.e., $E[|x_{t,i}|^2] = 1$. A CP chip-word of length $T_{CP}$ is appended to $X$ to construct
the $N_t \times (T_c + T_{CP})$ chip matrix $X'$ to be transmitted. We consider Chase-type ARQ: when the
decoding outcome is erroneous at ARQ round $k$, the receiver feeds back a negative acknowledgment
(NACK) message, then the transmitter completely retransmits chip matrix $X'$ in the next round.
Successful decoding triggers the feedback of a positive acknowledgment (ACK) message. The
transmitter then stops the transmission of the current frame and moves on to the next frame. Fig. 1
depicts the considered CP-CDMA MIMO transmission scheme with ACK/NACK.
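The spreading and CP insertion steps described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the helper names (`walsh_matrix`, `spread`, `add_cp`), the QPSK symbols, the sizes, and the `sqrt(N / C)` scaling used to obtain unit chip energy are all choices made here for the example. It also assumes that the symbol carried by chip $i$ is the one of symbol period $\lfloor i/N \rfloor$, and that the CP is a copy of the last $T_{CP}$ chips.

```python
import numpy as np

def walsh_matrix(N):
    """N x N Walsh-Hadamard matrix (N a power of two) with entries +-1/sqrt(N)."""
    H = np.array([[1.0]])
    while H.shape[0] < N:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(N)

def spread(symbols, W):
    """Multi-code spreading for one antenna.

    symbols: (C, T_sym) array, one row of symbols per code.
    W:       (N, C) spreading matrix.
    Returns the T_sym * N chips; chip i uses row (i mod N) of W.
    """
    # Column j of W @ symbols holds the N chips of symbol period j.
    return (W @ symbols).T.reshape(-1)

def add_cp(X, T_cp):
    """Prepend the last T_cp chips of each antenna stream as a cyclic prefix."""
    return np.concatenate([X[:, -T_cp:], X], axis=1)

rng = np.random.default_rng(1)
N, C, N_t, T_sym, T_cp = 16, 4, 2, 8, 10   # illustrative sizes only
W = walsh_matrix(N)[:, :C]                 # same codes on every antenna

# Unit-energy QPSK symbols, shape (antenna, code, symbol period).
shape = (N_t, C, T_sym)
qpsk = (rng.choice([-1, 1], shape) + 1j * rng.choice([-1, 1], shape)) / np.sqrt(2)

scale = np.sqrt(N / C)                     # assumed normalization to unit chip energy
X = np.stack([scale * spread(qpsk[t], W) for t in range(N_t)])
X_cp = add_cp(X, T_cp)

print(X.shape, X_cp.shape, np.mean(np.abs(X) ** 2))
```

With these sizes each antenna carries $T_s = C \cdot T_{sym} = 32$ symbols, so the block length is $T_c = T_s N / C = 128$ chips, in line with the relation stated in the text.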

B. Communication Model 

The broadband MIMO propagation channel connecting the $N_t$ transmit and the $N_R$ receive antennas
is composed of $L$ chip-spaced taps (index $l = 0, \dots, L - 1$). We assume a quasi-static block fading
channel, i.e., the channel is constant over an information block and independently changes from block
to block. The $N_R \times N_t$ channel matrix characterizing the $l$th discrete tap at ARQ round $k$ is denoted
$H_l^{(k)}$, and is made of zero-mean circularly symmetric complex Gaussian random entries. The average
channel energy per receive antenna is normalized as

$$\sum_{l=0}^{L-1} \sum_{t=1}^{N_t} E\left[ |h_{r,t,l}^{(k)}|^2 \right] = N_t, \quad r = 1, \dots, N_R, \qquad (7)$$

where $h_{r,t,l}^{(k)}$ is the $(r, t)$th element of $H_l^{(k)}$.

At the receiver side, after removing the CP-word at ARQ round $k$, a DFT is applied on received
signals. This yields $T_c$ frequency domain components grouped in block $y_f^{(k)}$,
which can be expressed as

$$y_f^{(k)} = \Lambda^{(k)} x_f + n_f^{(k)}, \qquad (9)$$

where $x_f$ and $n_f^{(k)}$ group the DFTs of transmitted chips and thermal noise at round $k$, respectively, and
$n_f^{(k)} \sim \mathcal{N}(0, \sigma^2 I_{T_c N_R})$. The channel frequency response (CFR) matrix $\Lambda^{(k)}$ at ARQ round $k$ is given by

$$\Lambda^{(k)} \triangleq \mathrm{diag}\{\Lambda_0^{(k)}, \dots, \Lambda_{T_c-1}^{(k)}\}, \qquad \Lambda_i^{(k)} = \sum_{l=0}^{L-1} H_l^{(k)} e^{-j(2\pi i l / T_c)}. \qquad (12)$$
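The relation between the tap matrices and the CFR in (12) can be checked numerically. The sketch below is an illustration only, with assumed sizes and a hypothetical helper `cfr`; it relies on the standard fact that, once a CP of length at least $L - 1$ has been removed, the channel acts on each block as a circular convolution, so that the unitary DFT of the noise-free received block equals $\Lambda_i^{(k)}$ times the DFT of the chips in every frequency bin $i$.

```python
import numpy as np

def cfr(H, T_c):
    """Per-bin CFR: Lambda_i = sum_l H_l * exp(-j*2*pi*i*l/T_c).

    H has shape (L, N_R, N_t); the result has shape (T_c, N_R, N_t).
    """
    L = H.shape[0]
    i = np.arange(T_c)[:, None]
    l = np.arange(L)[None, :]
    phase = np.exp(-2j * np.pi * i * l / T_c)           # (T_c, L)
    return np.einsum("il,lrt->irt", phase, H)

rng = np.random.default_rng(2)
N_t, N_R, L, T_c = 2, 2, 4, 16                          # illustrative sizes only

# Equal-power taps; each entry has variance 1/L, matching normalization (7).
H = (rng.standard_normal((L, N_R, N_t))
     + 1j * rng.standard_normal((L, N_R, N_t))) / np.sqrt(2 * L)
X = rng.choice([-1.0, 1.0], (N_t, T_c))                 # stand-in chip matrix

# Time domain, noise-free: circular convolution of the chips with the taps.
Y = np.zeros((N_R, T_c), dtype=complex)
for tap in range(L):
    Y += H[tap] @ np.roll(X, tap, axis=1)               # delay by `tap` chips

X_f = np.fft.fft(X, axis=1, norm="ortho")
Y_f = np.fft.fft(Y, axis=1, norm="ortho")

Lam = cfr(H, T_c)
Y_f_model = np.einsum("irt,ti->ri", Lam, X_f)           # Lambda_i x_{f,i} per bin

print(np.allclose(Y_f, Y_f_model))
```

The block-diagonal structure of $\Lambda^{(k)}$ is what allows the equalizer to work bin by bin on $N_R \times N_t$ matrices instead of on one large matrix.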



III. Iterative Receivers for CP-CDMA MIMO ARQ 

In this section, we present two efficient algorithms for performing turbo packet combining for
CP-CDMA MIMO ARQ systems: i) chip-level turbo packet combining, and ii) symbol-level turbo
packet combining. In both schemes, signals received in multiple ARQ rounds are processed using soft
MMSE FDE. Transmitted data blocks are decoded, at each ARQ round, in an iterative fashion through
the exchange of soft information, in the form of LLR values, between the soft packet combiner (i.e.,
the soft equalizer and demapper operating over ARQ rounds) and the SISO decoder.



A. Chip-Level Turbo Packet Combining 

To exploit the diversity available in received signals $y_f^{(1)}, \dots, y_f^{(k)}$, we view each ARQ round
$k$ as an additional group of $N_R$ virtual receive antennas. The MIMO ARQ system can therefore be
considered as a point-to-point MIMO link with $N_t$ transmit and $kN_R$ receive antennas, where the
$T_c k N_R \times 1$ chip-level virtual received signal vector $\underline{y}_f^{(k)}$ is constructed as

(13)

The frequency domain communication model after $k$ rounds is then given as

$$\underline{y}_f^{(k)} = \underline{\Lambda}^{(k)} x_f + \underline{n}_f^{(k)}. \qquad (14)$$
Soft ICI cancellation and frequency domain MMSE filtering are jointly performed over all ARQ
rounds. We call this concept "chip-level turbo packet combining". Implemented directly, this requires a huge
computational cost, since the complexity of computing the MMSE filters is cubic in the ARQ delay. In
addition, the required receiver memory size scales linearly with the ARQ delay because all CFRs
$\underline{\Lambda}_0^{(k)}, \dots, \underline{\Lambda}_{T_c-1}^{(k)}$ are required at round $k$ [14]. In the following, we introduce an efficient turbo MMSE
implementation algorithm for chip-level combining where both receiver complexity and memory
requirements are quite insensitive to the ARQ delay.

Let $\bar{x}$ and $\sigma_{t,i}^2$ denote the conditional mean and variance of $x$ and $x_{t,i}$, respectively. Soft MMSE
processing can be written in a compact forward-backward filtering structure as in [15]. By using the
matrix inversion lemma [16], we can express soft MMSE chip-level packet combining at round $k$ as



(17) 



where $\Gamma^{(k)} = \mathrm{diag}\{\Gamma_0^{(k)}, \dots, \Gamma_{T_c-1}^{(k)}\} \in \mathbb{C}^{T_c N_t \times T_c N_t}$ and $\Omega^{(k)} = \mathrm{diag}\{\Omega_0^{(k)}, \dots, \Omega_{T_c-1}^{(k)}\} \in \mathbb{C}^{T_c N_t \times T_c N_t}$
denote the forward and backward filters at round $k$, respectively, and are given by

(18)

(19)

$\tilde{\Xi}$ is the $N_t \times N_t$ unconditional covariance of transmitted chips, and is computed as the time average
of conditional covariance matrices $\Xi_i = \mathrm{diag}\{\sigma_{1,i}^2, \dots, \sigma_{N_t,i}^2\}$. Variables $\tilde{y}_f^{(k)}$ and $D_i^{(k)}$ are computed
according to the following recursions,

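$$\tilde{y}_f^{(k)} = \tilde{y}_f^{(k-1)} + \Lambda^{(k)H} y_f^{(k)}, \qquad (20)$$

$$D_i^{(k)} = D_i^{(k-1)} + \Lambda_i^{(k)H} \Lambda_i^{(k)}, \quad i = 0, \dots, T_c - 1, \qquad (21)$$

with $\tilde{y}_f^{(0)} = 0_{T_c N_t \times 1}$ and $D_i^{(0)} = 0_{N_t \times N_t}$.
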

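Note that recursions (20) and (21) are an important ingredient of the proposed chip-level combining
algorithm, since they make both complexity and memory requirements much less sensitive to the ARQ delay.
These issues are discussed in detail in Section IV. The inverse DFT (IDFT) is then applied to $z_f^{(k)}$
to obtain the equalized time domain chip sequence. After despreading, extrinsic LLR value $\phi_{t,j,m}^{(e)}$
corresponding to coded and interleaved bit $b_{t,j,m}$, $\forall\, j, m$, is computed as

$$\phi_{t,j,m}^{(e)} = \log \frac{\sum_{s \in \mathcal{S}_m^{1}} \exp\left\{ \xi_{t,j}^{(k)}(s) + \sum_{m' \neq m} \lambda_{m'}\{s\}\, \phi_{t,j,m'}^{(a)} \right\}}{\sum_{s \in \mathcal{S}_m^{0}} \exp\left\{ \xi_{t,j}^{(k)}(s) + \sum_{m' \neq m} \lambda_{m'}\{s\}\, \phi_{t,j,m'}^{(a)} \right\}}, \qquad (22)$$

where $\xi_{t,j}^{(k)}(s) = -|r_{t,j}^{(k)} - g_{t,j}^{(k)} s|^2 / \theta_{t,j}^{(k)2}$, and $r_{t,j}^{(k)}$, $g_{t,j}^{(k)}$, and $\theta_{t,j}^{(k)2}$ are the despreading module output, gain, and
residual interference variance, respectively. $\phi_{t,j,m'}^{(a)}$ denotes the a priori LLR value corresponding to $b_{t,j,m'}$,
$\lambda_{m'}\{s\}$ is an operator that extracts the $m'$th bit labeling symbol $s \in \mathcal{S}$, and $\mathcal{S}_m^{\beta}$ is the set of
symbols where the $m$th bit is equal to $\beta$, i.e. $\mathcal{S}_m^{\beta} = \{s : \lambda_m\{s\} = \beta\}$. The obtained extrinsic LLR
values are de-interleaved and fed to the SISO decoder. The proposed low complexity algorithm is
summarized in Table I.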
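The benefit of accumulating per-round quantities instead of storing every round can be illustrated with a short NumPy sketch. It is not the paper's filter: it assumes that the recursions accumulate the matched-filter output $\Lambda^{(k)H} y_f^{(k)}$ and the Gram matrices $\Lambda_i^{(k)H} \Lambda_i^{(k)}$ (the per-round quantities listed in Table II), uses random stand-ins for the CFRs, and replaces the soft forward-backward filters by a plain linear MMSE estimate with no a priori information and unit chip variance. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N_t, N_R, T_c, K, sigma2 = 2, 1, 8, 3, 0.1   # illustrative values only

def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def mmse_no_prior(D, y_tilde, sigma2):
    """Per-bin linear MMSE estimate (D_i + sigma2*I)^{-1} y_tilde_i (simplified)."""
    A = D + sigma2 * np.eye(D.shape[-1])
    return np.linalg.solve(A, y_tilde[..., None])[..., 0]

x_f = crandn(T_c, N_t)                          # DFT of the chips, one row per bin
y_tilde = np.zeros((T_c, N_t), dtype=complex)   # running sum of Lambda^H y
D = np.zeros((T_c, N_t, N_t), dtype=complex)    # running sum of Lambda^H Lambda
history = []                                    # kept ONLY to build the reference

for k in range(1, K + 1):
    Lam = crandn(T_c, N_R, N_t)                 # stand-in CFR of round k
    noise = np.sqrt(sigma2) * crandn(T_c, N_R)
    y = (Lam @ x_f[..., None])[..., 0] + noise
    Lam_H = Lam.conj().transpose(0, 2, 1)
    y_tilde += (Lam_H @ y[..., None])[..., 0]   # accumulation of matched-filter outputs
    D += Lam_H @ Lam                            # accumulation of Gram matrices
    history.append((Lam, y))

x_rec = mmse_no_prior(D, y_tilde, sigma2)

# Reference: explicit virtual array with K*N_R receive antennas per bin.
Lam_stack = np.concatenate([Lam for Lam, _ in history], axis=1)
y_stack = np.concatenate([y for _, y in history], axis=1)
Ls_H = Lam_stack.conj().transpose(0, 2, 1)
gram = Ls_H @ Lam_stack
x_ref = mmse_no_prior(gram, (Ls_H @ y_stack[..., None])[..., 0], sigma2)

print(np.allclose(x_rec, x_ref))
```

The accumulated state has $T_c N_t (N_t + 1)$ complex entries regardless of how many rounds have been combined, whereas the explicit virtual array grows with every retransmission.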
B. Symbol-Level Turbo Packet Combining

In this combining scheme, the receiver performs chip-level space-time frequency domain equalization
separately for each ARQ round, then combines multiple transmissions at the level of the soft
demapper. At each iteration of ARQ round $k$, soft ICI cancellation and MMSE filtering are performed
similarly to (17) using communication model (9). Extrinsic information is computed using despreading
module outputs corresponding to all ARQ rounds. In general, this requires the inversion of the $k \times k$ covariance
matrix of residual interference plus noise. By observing that despreading module outputs obtained
at different transmissions are independent, extrinsic LLR value $\phi_{t,j,m}^{(e)}$ corresponding to coded and
interleaved bit $b_{t,j,m}$ can be expressed as

(23)

where $\bar{\xi}_{t,j}^{(k)}(s)$ is recursively computed according to the following recursion,

(24)

Note that this recursive implementation relaxes both the complexity and memory requirements. The
proposed low complexity algorithm is summarized in Table II.

IV. Complexity and Performance Analysis

A. Complexity Evaluation

In this subsection, we briefly analyze both the computational cost and memory requirements of the
proposed packet combining schemes. First, note that both algorithms have nearly identical implementations.
The only difference comes from step 1.1 of Table I and step 1.1.3 of Table II. Therefore, both techniques
have approximately the same implementation cost. In the following, we focus on the number of
arithmetic additions and memory required to perform recursions (20), (21), and (24).

The main idea in the proposed algorithms is to exploit the diversity available in multiple
transmissions without explicitly storing required soft channel outputs (i.e., signals and CFRs) or
decisions (i.e., filter outputs), corresponding to all ARQ rounds. This is performed with the aid of
recursions (20), (21), and (24), and translates into a memory requirement of $2T_cN_t(N_t + 1)$ and
$T_sN_t2^M$ real values for chip-level and symbol-level turbo combining, respectively. Note that in both
schemes, the required memory size is insensitive to the ARQ delay. The number of rounds only
influences the number of arithmetic additions required in the update procedures corresponding to
recursions (20), (21), and (24). At each ARQ round, the chip-level turbo combining algorithm involves
$2T_cN_t(N_t + 1)$ arithmetic additions to update $\tilde{y}_f^{(k)}$ and $D_i^{(k)}$. The symbol-level turbo combining
scheme requires $T_sN_tN_{iter}2^M$ arithmetic additions to update $\bar{\xi}_{t,j}^{(k)}(s)$ at each round, where $N_{iter}$ denotes
the number of turbo iterations.

Table III summarizes the maximum number of arithmetic additions and memory size required
by both schemes. Note that the number of additions does not have a great impact on receiver
computational complexity. The required memory size is the major implementation constraint to take
into account when choosing between chip-level and symbol-level combining. In the case of low-order
modulations (i.e., $M \le 2$), symbol-level has lower memory requirements than chip-level combining
independently of the spreading factor $N$, number of codes $C$, and number of transmit antennas $N_t$.
For high-order modulations (i.e., $M \ge 3$), the required memory size mainly depends on system
parameters. For instance, when $M = 4$, $N_t = 4$, and the system is fully loaded (i.e., $N = C$),
chip-level combining has lower memory requirements than symbol-level combining. When the load
factor is reduced to 50% (i.e., $C/N = 1/2$), symbol-level becomes more attractive than chip-level.
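The memory comparison above can be reproduced with a few lines of arithmetic. The sketch below simply evaluates the expressions quoted in the text and in Table III; the function names and the block length `T_s = 64` are assumptions made for the example, and the chip block length is taken as $T_c = T_s N / C$.

```python
def chips_per_block(T_s, N, C):
    """T_c = T_s * N / C chips for T_s symbols spread over C codes of length N."""
    return T_s * N // C

def chip_level_memory(T_c, N_t):
    """Real values stored by chip-level combining: 2*T_c*N_t*(N_t + 1)."""
    return 2 * T_c * N_t * (N_t + 1)

def symbol_level_memory(T_s, N_t, M):
    """Real values stored by symbol-level combining: T_s*N_t*2^M."""
    return T_s * N_t * 2 ** M

def chip_level_additions(T_c, N_t, K):
    """Maximum additions over K rounds for chip-level combining."""
    return 2 * T_c * N_t * (K - 1) * (N_t + 1)

def symbol_level_additions(T_s, N_t, K, n_iter, M):
    """Maximum additions over K rounds for symbol-level combining."""
    return T_s * N_t * (K - 1) * n_iter * 2 ** M

T_s, N, N_t, M = 64, 16, 4, 4          # T_s is an illustrative block length
for C in (16, 8):                      # full load, then 50% load
    T_c = chips_per_block(T_s, N, C)
    chip = chip_level_memory(T_c, N_t)
    symbol = symbol_level_memory(T_s, N_t, M)
    better = "chip-level" if chip < symbol else "symbol-level"
    print(f"C={C}: chip={chip}, symbol={symbol}, smaller memory: {better}")
```

Dividing both memory expressions by $T_s N_t$ shows that the comparison reduces to $2(N/C)(N_t + 1)$ against $2^M$, which is why the outcome depends only on the load factor, the number of transmit antennas, and the modulation order.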

B. Performance Evaluation 

In this subsection, we evaluate the throughput performance of the proposed CP-CDMA MIMO
ARQ turbo combining schemes. Following [17], we define the throughput as $\eta = E\{\mathcal{R}\} / E\{\mathcal{K}\}$, where $\mathcal{R}$ is a
random variable (RV) that takes $R$ when the packet is correctly received or zero when the packet is
erroneous after $K$ ARQ rounds. $\mathcal{K}$ is a RV that denotes the number of rounds used for transmitting
one data packet. We use Monte Carlo simulations for evaluating $\eta$.
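The throughput definition above lends itself to a simple Monte Carlo estimator: average the delivered rate and the number of rounds over many packets and take the ratio. The sketch below is a toy illustration only. The per-round success probabilities in `p_success` are made-up numbers standing in for the actual receiver, and the function names are hypothetical.

```python
import random

def simulate_packet(R, K, p_success, rng):
    """Toy ARQ session: round k succeeds with probability p_success[k-1].

    Returns (delivered_rate, rounds_used): the rate is R on success and
    0 if the packet is still in error after K rounds.
    """
    for k in range(1, K + 1):
        if rng.random() < p_success[k - 1]:
            return R, k
    return 0, K

def throughput(outcomes):
    """Estimate eta = E{R} / E{K} from a list of (rate, rounds) pairs."""
    total_rate = sum(rate for rate, _ in outcomes)
    total_rounds = sum(rounds for _, rounds in outcomes)
    return total_rate / total_rounds

rng = random.Random(0)
R, K = 8, 3
p_success = [0.3, 0.6, 0.9]      # placeholder values, not simulation results
outcomes = [simulate_packet(R, K, p_success, rng) for _ in range(10000)]
print(throughput(outcomes))
```

Because both expectations are estimated over the same set of packets, the common factor $1/n$ cancels and the estimator is just the total delivered rate divided by the total number of rounds used.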

We consider an STC using a 1/2-rate convolutional encoder with polynomial generators $(35, 23)_8$,
quadrature phase shift keying (QPSK) modulation, $N_t = 2$ transmit antennas, and a spreading factor
$N = 16$. The length of the code bit frame is 1024 bits including tails. We evaluate the throughput
performance for the following loads: 25% (i.e., $C = 4$), 50% (i.e., $C = 8$), and 100% (i.e., $C = 16$),
which correspond to rates $R = 8$, $R = 16$, and $R = 32$, respectively. The ARQ delay is $K = 3$. The
broadband MIMO channel has $L = 10$ chip-spaced equal power taps, and the CP length is $T_{CP} = 10$.
The $E_c/N_0$ ratio appearing in all figures is the signal to noise ratio (SNR) per chip per receive
antenna. We use Max-Log-maximum a posteriori (MAP) for SISO decoding. The number of turbo
iterations is set to three. In all scenarios, we consider the matched filter bound (MFB) throughput
performance of the corresponding CP-CDMA MIMO ARQ channel to evaluate the ICI cancellation
capability achieved by the proposed techniques.

In Fig. 2, we report throughput performance curves for a balanced MIMO configuration, i.e.,
$N_R = N_t = 2$. We observe that both combining schemes have similar throughput performance
for quarter and half loads. In the case of full load, chip-level combining outperforms symbol-level
combining in the region of low SNR. For instance, the performance gap is around 0.6 dB at $\eta =
12.5$ bit/s/Hz throughput. Also, note that for all configurations, the slopes of the throughput curves of
both techniques are asymptotically similar to that of the MFB. Therefore, both combining schemes
asymptotically achieve the diversity order of the corresponding CP-CDMA MIMO ARQ channel.

In Fig. 3, we provide throughput curves when only one receive antenna ($N_R = 1$) is used, i.e., an
unbalanced MIMO configuration. In this scenario, chip-level combining clearly outperforms
symbol-level combining for half and full loads. The performance gap is about 3 dB at $\eta = 12.5$ bit/s/Hz for
a full load configuration. This suggests that chip-level turbo combining can be used for high speed
downlink CDMA MIMO systems with high loads. Note that both techniques fail to achieve the full
diversity order in the case of half and full loads.

V. Conclusions 

In this paper, efficient turbo receiver schemes for multi-code CP-CDMA transmission with ARQ
operating over broadband MIMO channels were investigated. Two packet combining algorithms were
introduced. The chip-level technique performs packet combining jointly with chip-level MMSE FDE.
The symbol-level scheme combines multiple transmissions at the level of the soft demapper. We
analyzed the complexity and memory size required by both techniques, and showed that, from an
implementation point of view, chip-level is more attractive than symbol-level combining for systems
with high modulation order and load factor (number of codes with respect to the spreading factor).
We also investigated the throughput performance. Simulations demonstrated that both techniques
have similar performance for balanced MIMO configurations. In the case of unbalanced
configurations (more transmit than receive antennas), chip-level combining outperforms symbol-level
combining, especially for full load factors.
Fig. 1. CP-CDMA MIMO transmission scheme with ACK/NACK.



TABLE I

Summary of the Chip-Level Turbo Combining Algorithm

0. Initialization
   Initialize $\tilde{y}_f^{(0)}$ and $D_i^{(0)}$ with $0_{T_c N_t \times 1}$ and $0_{N_t \times N_t}$, respectively.

1. Combining at round $k$
   1.1. Update $\tilde{y}_f^{(k)}$ and $D_i^{(k)}$ according to (20) and (21).
   1.2. At each iteration,
        1.2.1 Compute the forward and backward filters using (18) and (19).
        1.2.2 Compute the MMSE estimate of $x_f$ using (17).
        1.2.3 Compute extrinsic LLRs $\phi_{t,j,m}^{(e)}$ according to (22).
   1.3. end 1.2.
TABLE II

Summary of the Symbol-Level Turbo Combining Algorithm

0. Initialization:
   Initialize $\bar{\xi}_{t,j}^{(0)}(s)$ with 0.

1. Combining at round $k$
   1.1. At each iteration,
        1.1.1 Compute the forward and backward filters using (18) and (19) with $D_i^{(k)} = \Lambda_i^{(k)H} \Lambda_i^{(k)}$.
        1.1.2 Compute the MMSE estimate of $x_f$ using (17) and $\tilde{y}_f^{(k)} = \Lambda^{(k)H} y_f^{(k)}$.
        1.1.3 Update $\bar{\xi}_{t,j}^{(k)}(s)$ according to (24).
        1.1.4 Compute extrinsic LLRs $\phi_{t,j,m}^{(e)}$ using (23).
   1.2. end 1.1.
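The demapper-level combining summarized in Table II can be sketched in a few lines. This is a hedged illustration, not the paper's exact expressions: it assumes that the combined metric is the running sum over ARQ rounds of a per-round Gaussian metric $-|r - g s|^2 / \theta^2$ (which is what independence of the despreading outputs across rounds would give), and that the extrinsic LLR then has the same log-ratio form as in the chip-level scheme. The Gray-labelled QPSK constellation, the function names, and the numerical values are all assumptions made for the example.

```python
import math

# Gray-labelled QPSK (M = 2); the labelling is an assumption for this example.
A = 1 / math.sqrt(2)
CONSTELLATION = {
    (0, 0): complex(A, A),
    (0, 1): complex(-A, A),
    (1, 1): complex(-A, -A),
    (1, 0): complex(A, -A),
}

def round_metric(r, g, theta2):
    """Per-round metric -|r - g*s|^2 / theta2 for every constellation point."""
    return {bits: -abs(r - g * s) ** 2 / theta2 for bits, s in CONSTELLATION.items()}

def accumulate(xi_bar, xi):
    """Running sum over ARQ rounds (assumed form of the metric recursion)."""
    return {bits: xi_bar[bits] + xi[bits] for bits in xi_bar}

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def extrinsic_llr(xi_bar, prior_llr, m):
    """Extrinsic LLR of bit m, using a priori LLRs of the other bits only."""
    def branch(beta):
        return log_sum_exp([
            xi_bar[bits]
            + sum(bits[mp] * prior_llr[mp] for mp in range(len(bits)) if mp != m)
            for bits in xi_bar if bits[m] == beta
        ])
    return branch(1) - branch(0)

# Two ARQ rounds carrying the same symbol, labelled (1, 0).
s_tx = CONSTELLATION[(1, 0)]
rounds = [  # (despreader output r, gain g, residual variance theta2): made-up values
    (0.8 * s_tx + complex(0.05, -0.02), 0.8, 0.20),
    (0.9 * s_tx + complex(-0.03, 0.04), 0.9, 0.15),
]

xi_bar = {bits: 0.0 for bits in CONSTELLATION}       # initialization with 0
for r, g, theta2 in rounds:
    xi_bar = accumulate(xi_bar, round_metric(r, g, theta2))

prior = [0.0, 0.0]                                   # no a priori information yet
llrs = [extrinsic_llr(xi_bar, prior, m) for m in range(2)]
print(llrs)
```

Only one metric per constellation point has to be kept per symbol position and antenna, which matches the $T_s N_t 2^M$ memory figure quoted in Section IV.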



TABLE III

Summary of the Maximum Number of Arithmetic Additions and Memory Size

                        Chip-Level Combining          Symbol-Level Combining

Arithmetic Additions    $2T_cN_t(K-1)(N_t+1)$         $T_sN_t(K-1)N_{iter}2^M$

Memory                  $2T_cN_t(N_t+1)$              $T_sN_t2^M$
Fig. 2. Throughput performance with $N_t = 2$, $N_R = 2$, $L = 10$ equal power tap profile.
Fig. 3. Throughput performance with $N_t = 2$, $N_R = 1$, $L = 10$ equal power tap profile.



